ScoreHero
Home | Forum | Wiki
Inbox [ Login ]Inbox [ Login ]
SearchSearch MemberlistMemberlist
ProfileProfile Log inLog in
The Programming Thread / Project Euler Thread
Goto page Previous  1, 2, 3 ... 16, 17, 18 ... 28, 29, 30  Next
 
Post new topic   Reply to topic    ScoreHero Forum Index -> General Chat
View previous topic :: View next topic  
Author Message
bclare  





Joined: 21 Jun 2008
Posts: 6048
Location: Boston

PostPosted: Fri Feb 18, 2011 6:25 pm    Post subject: Reply with quote

Robbert wrote:
Have you tried a digram and trigram analysis yet? It may shed more light than just the single letters.


This. Specifically, looking at double letters could help a lot. Just glancing at the code, I'm seeing AA, BB and MM at least once each. Since there's no spacing in your code, that might just be one word ending in a letter and the next word starting with the same letter, but if it's all in one word that pretty much narrows the choices down to O, E, T, L, R, and maybe M or B (with other doubled letters possible but quite uncommon). If any of the double letters are very common, that almost guarantees that letter is E, or possibly O.

The other most useful form of digram analysis is finding commonly paired letters, for example if you find XY and ZY both occuring a lot, then a good guess would be that X/Z = T/W and Y = H.

I might take a look at this later when I'm out of work. I won't tell you the answer if I get it, but I could definitely give you a few good hints. I've actually got a little book with a bunch of lists of frequencies of letters and digrams and trigrams in english, plus common patterns.
_________________
I'm back I suppose
Back to top
View user's profile Wiki User Page Send private message XBL Gamertag: bclare PSN Name: bclare1729
FBMrider86  





Joined: 23 Jan 2007
Posts: 1679
Location: Lawrenceburg, KY <- And I ain't havin' no fun

PostPosted: Fri Feb 18, 2011 7:08 pm    Post subject: Reply with quote

bclare wrote:
Robbert wrote:
Have you tried a digram and trigram analysis yet? It may shed more light than just the single letters.


This. Specifically, looking at double letters could help a lot. Just glancing at the code, I'm seeing AA, BB and MM at least once each. Since there's no spacing in your code, that might just be one word ending in a letter and the next word starting with the same letter, but if it's all in one word that pretty much narrows the choices down to O, E, T, L, R, and maybe M or B (with other doubled letters possible but quite uncommon). If any of the double letters are very common, that almost guarantees that letter is E, or possibly O.

The other most useful form of digram analysis is finding commonly paired letters, for example if you find XY and ZY both occuring a lot, then a good guess would be that X/Z = T/W and Y = H.

I might take a look at this later when I'm out of work. I won't tell you the answer if I get it, but I could definitely give you a few good hints. I've actually got a little book with a bunch of lists of frequencies of letters and digrams and trigrams in english, plus common patterns.


I've been reworking the program to include bigrams. Trigrams were also suggested by my professor so I guess can implement those too. At this point I'm just trying to get everything readable (OCD programming?) and sorted (sorting 2d arrays of Strings took me a bit to figure out). Once that happens I'll report back.
_________________
Quote:
I think my neighbors have figured out my address. Should I move or kill them?

Custom Guitar Paint Jobs (Submit Yours!)
Proud supporter of EWiggen's Grilled Cheese and GH Combo
socrstopr wrote:
This thread is very disturburbing. Also terrible.
Back to top
View user's profile Send private message MSN Messenger XBL Gamertag: Head0nFire86
footballtom3685  





Joined: 16 Sep 2007
Posts: 2478
Location: Bay Area, CA

PostPosted: Fri Feb 18, 2011 9:16 pm    Post subject: Reply with quote

Urisma wrote:
footballtom: Wouldn't that still make all the substitutions the same? The message is 444 characters (I think). Multiplying all the percents by 444, and then finding the least difference between counts would still net you all the same substitutions, unless I'm misinterpreting your statement. That method increases the accuracy of each by a small amount, and might help encrypted letters that pop up maybe once or twice, but overall I don't think it would help.
I think you're misinterpreting what I meant. My method would be to just rank every letter by how many times it appears in the text. You would also rank the percentages, so at 12.7% E would be number 1. The letter with the highest count would then be E, and so on.

I interpreted FBMRider's original method as finding percentages and rather than going by a ranked ordering trying to substitute based on the difference in percentage. (i.e. if he got that there were 8.2% Z's, he would substitute A for Z because the research percentage says that A appears 8.2% of the time)

Hope that makes sense, but anyway it seems like he needs to do something different, and I've never heard of digram/trigram analysis so I unfortunately have nothing else to add
Back to top
View user's profile Wiki User Page Send private message Send e-mail
bclare  





Joined: 21 Jun 2008
Posts: 6048
Location: Boston

PostPosted: Fri Feb 18, 2011 9:45 pm    Post subject: Reply with quote

footballtom3685 wrote:
Urisma wrote:
footballtom: Wouldn't that still make all the substitutions the same? The message is 444 characters (I think). Multiplying all the percents by 444, and then finding the least difference between counts would still net you all the same substitutions, unless I'm misinterpreting your statement. That method increases the accuracy of each by a small amount, and might help encrypted letters that pop up maybe once or twice, but overall I don't think it would help.
I think you're misinterpreting what I meant. My method would be to just rank every letter by how many times it appears in the text. You would also rank the percentages, so at 12.7% E would be number 1. The letter with the highest count would then be E, and so on.

I interpreted FBMRider's original method as finding percentages and rather than going by a ranked ordering trying to substitute based on the difference in percentage. (i.e. if he got that there were 8.2% Z's, he would substitute A for Z because the research percentage says that A appears 8.2% of the time)


I think I see what you're saying, but the difference in your proposed substitutions would be minimal. At any rate, it's very unlikely that a particular natural language passage would have the exact same letter frequencies either by percent or even ordinally (ranked), so neither method would be perfect, they'd just be wrong in slightly different ways.
_________________
I'm back I suppose
Back to top
View user's profile Wiki User Page Send private message XBL Gamertag: bclare PSN Name: bclare1729
footballtom3685  





Joined: 16 Sep 2007
Posts: 2478
Location: Bay Area, CA

PostPosted: Fri Feb 18, 2011 11:01 pm    Post subject: Reply with quote

Yeah, I said in my first post that it probably wouldn't help much, if at all; it was just my initial thought and I had a feeling it might be a bit better than his original method. I do think it would be fairly reliable for the highest and lowest percentage letters, though. Now I almost want to try this on my own...but knowing nothing about the actual way to do it, it seems silly to even try.
Back to top
View user's profile Wiki User Page Send private message Send e-mail
IndestructibleSD  





Joined: 17 Jul 2008
Posts: 1382
Location: Boston, MA

PostPosted: Fri Feb 18, 2011 11:40 pm    Post subject: Reply with quote

So, my teacher told us what one of our assignments will be once we get back from February break and I wanted to get a head start on it this afternoon just because the problem seemed interesting. I toyed around with a few ideas and this is what I came up with. I just want to know what you all think about it.

The problem was: design a program that requests 10 integers, places them into an array and outputs the mode (integer that occurs the most times) using a method that accepts an array for its parameters.

A few of my friends said they were going to use another array as a counter of sorts, but I was unsure how you would match up the counter array with the original array since the counter array will have less values in it. For example, if your original array only contains 4 different numbers out of 10 integers, your counter array would have an index of 4, so I would be unsure how to sync those two up and make them "parallel".

Anyways, doesn't this seem like a more sound way than what my friends described?

Code:
public static int getMode(int[] TenNumbers) {

      int iMode = -1;
      int iMaxCount = 0;

      for(int j = 0; j < TenNumbers.length; j++) {
         //Reinitialize temp counter
         int iCount = 0;
         for (int k = 0; k < TenNumbers.length; k++) {
            if (TenNumbers[k] == TenNumbers[j])
            iCount++;
         }


And for some reason SH's posting system isn't accurately representing my code, so I'm going to try splitting it up.

Code:
//Store the max counter value achieved
      if (iCount > iMaxCount) {
         iMode = TenNumbers[j];
         iMaxCount = iCount;
         }
      }

      return iMode;
}


Not sure why my code was getting drastically modified when it was grouped together under the same code display, but oh well. I hope you get the general idea of what my method accomplishes.
_________________
dbforthree, Expert Vocalist PS4
Back to top
View user's profile Send private message Visit poster's website PSN Name: dbforthree
FBMrider86  





Joined: 23 Jan 2007
Posts: 1679
Location: Lawrenceburg, KY <- And I ain't havin' no fun

PostPosted: Sat Feb 19, 2011 1:05 am    Post subject: Reply with quote

Yours looks fine, however it won't cover occasions where there's more than one mode. You would need an array for that, but at its biggest it would only have to be half of the total number of integers in the original array.
_________________
Quote:
I think my neighbors have figured out my address. Should I move or kill them?

Custom Guitar Paint Jobs (Submit Yours!)
Proud supporter of EWiggen's Grilled Cheese and GH Combo
socrstopr wrote:
This thread is very disturburbing. Also terrible.
Back to top
View user's profile Send private message MSN Messenger XBL Gamertag: Head0nFire86
IndestructibleSD  





Joined: 17 Jul 2008
Posts: 1382
Location: Boston, MA

PostPosted: Sat Feb 19, 2011 1:40 am    Post subject: Reply with quote

FBMrider86 wrote:
Yours looks fine, however it won't cover occasions where there's more than one mode. You would need an array for that, but at its biggest it would only have to be half of the total number of integers in the original array.

Yeah, I was thinking about that. How would using an extra array help you in cases where you had more than one mode? I'm not doubting you, I'm just curious as to how it would work.
_________________
dbforthree, Expert Vocalist PS4
Back to top
View user's profile Send private message Visit poster's website PSN Name: dbforthree
FBMrider86  





Joined: 23 Jan 2007
Posts: 1679
Location: Lawrenceburg, KY <- And I ain't havin' no fun

PostPosted: Sat Feb 19, 2011 2:20 am    Post subject: Reply with quote

IndestructibleSD wrote:
FBMrider86 wrote:
Yours looks fine, however it won't cover occasions where there's more than one mode. You would need an array for that, but at its biggest it would only have to be half of the total number of integers in the original array.

Yeah, I was thinking about that. How would using an extra array help you in cases where you had more than one mode? I'm not doubting you, I'm just curious as to how it would work.


After thinking about it, a second array isn't a bad idea period. The way it would work is:
Code:
Pseudo-code!
read array of integers
check each integer in the array for multiples and log the number of times they occur
create a 2d array (matrix), with the left column storing the integers and the right column storing the number of times they occur

find the largest value in the second column of the 2d array (the row number in this case), and display that row number, column 1.

1 2 3 1 2 3 1 2 3 0
[1, 3] <- find 3, display 1 \
[2, 3] <- find 3, display 2  --- the modes
[3, 3] <- find 3, display 3 /
[0, 1] <- find 1, don't display 0

sorry if that doesn't make sense, i'm doing some intense multitasking right now (working on the problem I posted, talking to the g/f, and replying to a fellow student's answer on Blackboard).
_________________
Quote:
I think my neighbors have figured out my address. Should I move or kill them?

Custom Guitar Paint Jobs (Submit Yours!)
Proud supporter of EWiggen's Grilled Cheese and GH Combo
socrstopr wrote:
This thread is very disturburbing. Also terrible.
Back to top
View user's profile Send private message MSN Messenger XBL Gamertag: Head0nFire86
IndestructibleSD  





Joined: 17 Jul 2008
Posts: 1382
Location: Boston, MA

PostPosted: Sat Feb 19, 2011 5:43 pm    Post subject: Reply with quote

FBMrider86 wrote:
Yours looks fine, however it won't cover occasions where there's more than one mode. You would need an array for that, but at its biggest it would only have to be half of the total number of integers in the original array.

Just re-read this, and couldn't there be instances where the second array was the exact same size as the first array? If all of the numbers only appeared once, then they're all the mode of the original array, so the second array would be of equal size.
_________________
dbforthree, Expert Vocalist PS4
Back to top
View user's profile Send private message Visit poster's website PSN Name: dbforthree
FBMrider86  





Joined: 23 Jan 2007
Posts: 1679
Location: Lawrenceburg, KY <- And I ain't havin' no fun

PostPosted: Sat Feb 19, 2011 6:50 pm    Post subject: Reply with quote

IndestructibleSD wrote:
FBMrider86 wrote:
Yours looks fine, however it won't cover occasions where there's more than one mode. You would need an array for that, but at its biggest it would only have to be half of the total number of integers in the original array.

Just re-read this, and couldn't there be instances where the second array was the exact same size as the first array? If all of the numbers only appeared once, then they're all the mode of the original array, so the second array would be of equal size.

Indeed, but in this case you could have an if-statement before your loop checking to see if they're all different, then you won't have to create a new array and just list the entire original array.
_________________
Quote:
I think my neighbors have figured out my address. Should I move or kill them?

Custom Guitar Paint Jobs (Submit Yours!)
Proud supporter of EWiggen's Grilled Cheese and GH Combo
socrstopr wrote:
This thread is very disturburbing. Also terrible.
Back to top
View user's profile Send private message MSN Messenger XBL Gamertag: Head0nFire86
IndestructibleSD  





Joined: 17 Jul 2008
Posts: 1382
Location: Boston, MA

PostPosted: Sat Feb 19, 2011 7:36 pm    Post subject: Reply with quote

I guess the difficulty I'm encountering is determining how I'm going to filter out duplicate values to make my 2D array that will be in the format you suggested, FBM.

I've looked up tons of things online and filtering arrays just seems to be a topic people either don't know how to do or are having the same trouble as me.
_________________
dbforthree, Expert Vocalist PS4
Back to top
View user's profile Send private message Visit poster's website PSN Name: dbforthree
Bj0rn  





Joined: 31 Aug 2007
Posts: 522
Location: Stockholm, Sweden

PostPosted: Sun Feb 20, 2011 4:45 am    Post subject: Reply with quote

Well there are many ways to solve that problem. The code posted above is one way, but it's slow. When the input is bounded at 10 integers, it's absolutely fine. But if that were not the case, the algorithm takes quadratic time, i.e. for ever number you check it against all numbers. So if there are 20 integers in the input, you will do 20^2 = 400 checks.

Another way that would be a lot faster on bigger inputs is to sort the array. Then finding the mode would be trivial. Like this:
Code:

        static int getMode(int[] TenNumbers)
        {
            List<int> sortedList = new List<int>(TenNumbers);
            sortedList.Sort();

            int max = 0;
            int mode = 0;
            int current = sortedList[0] - 1; //Or any value different from sortedList[0].
            int count = 0;
            for (int i = 0; i < sortedList.Count; i++)
            {
                if (sortedList[i] == current)
                    count++;
                else
                {
                    current = sortedList[i];
                    count = 1;
                }

                if (max < count)
                {
                    max = count;
                    mode = current;
                }
            }

            return mode;
        }

Here the sorting is the bottleneck. But a good sorting algorithm is still way faster than quadratic time.

However this problem can also be solved in linear time (with high probability) with the use of a hash table. Linear time means every integer just has to be examined once. This may be just what you were looking for with the 2D array you guys are talking about. I'll start with the code:
Code:

        static int getMode(int[] TenNumbers)
        {
            Dictionary<int_int> occurrences = new Dictionary<int_int>();
            //int_int should really be int, int, but the code tags behave weirdly.

            int max = 0;
            int mode = 0;
            foreach (int num in TenNumbers)
            {
                if (!occurrences.ContainsKey(num))
                    occurrences.Add(num, 0);
                int count = ++occurrences[num];

                if (max < count)
                {
                    max = count;
                    mode = num;
                }
            }

            return mode;
        }

In Visual C# (or any .NET language) Dictionary is a hash table. I think it's called HashMap in java and the syntax may be a bit different. But the idea with a hash table is to associate a key with a value. In this case the key is an integer from the input array and the value is how many times it has occurred in the array. It follows the same concept of your "2d array" in that it is an associative array. The hash table just happens to be an implementation with excellent performance. So if I had to pick one solution for this problem, it would definitely be this one. It's super fast no matter how many integers you throw at it (unless you systematically pick values that collide in the hash table) and it's even the simplest looking algorithm.

Now if you want to return an array with all mode values (in case there are more than one), just a bit of tweaking is needed regardless of which algorithm was used
_________________
My accomplishment thread
Champion of 6 ScoreHero leagues:
S7: GH3X AAA-2
S10: GH3X AAA-1, GHAX
S11: GH3X AAA, GH5X AAA
S16: GH3X
My youtube channel
Back to top
View user's profile Send private message XBL Gamertag: BurningThunder
IndestructibleSD  





Joined: 17 Jul 2008
Posts: 1382
Location: Boston, MA

PostPosted: Sun Feb 20, 2011 6:37 am    Post subject: Reply with quote

Yeah, lists and hash tables are far beyond anything we've learned in my AP Computer Science class. I highly doubt my teacher wants us to account for more than one mode, and if he does, it would probably be for some extra credit.

In other news, my median method worked beautifully! I used the Arrays.sort method (from java.util.Arrays) to make the process much easier. I'm satisfied with my method for calculating the mode for now, but perhaps I'll revisit it once I learn about the methods you suggested, Bj0rn. Thanks for your help everyone, I'll probably be back, though, since I've been doing tons of coding for fun recently.
_________________
dbforthree, Expert Vocalist PS4
Back to top
View user's profile Send private message Visit poster's website PSN Name: dbforthree
Robbert  





Joined: 21 Mar 2007
Posts: 373
Location: Netherlands

PostPosted: Mon Feb 21, 2011 2:31 pm    Post subject: Reply with quote

Bj0rn wrote:

However this problem can also be solved in linear time (with high probability) with the use of a hash table. Linear time means every integer just has to be examined once. This may be just what you were looking for with the 2D array you guys are talking about. I'll start with the code:
Code:

        static int getMode(int[] TenNumbers)
        {
            Dictionary<int_int> occurrences = new Dictionary<int_int>();
            //int_int should really be int, int, but the code tags behave weirdly.

            int max = 0;
            int mode = 0;
            foreach (int num in TenNumbers)
            {
                if (!occurrences.ContainsKey(num))
                    occurrences.Add(num, 0);
                int count = ++occurrences[num];

                if (max < count)
                {
                    max = count;
                    mode = num;
                }
            }

            return mode;
        }

In Visual C# (or any .NET language) Dictionary is a hash table. I think it's called HashMap in java and the syntax may be a bit different. But the idea with a hash table is to associate a key with a value. In this case the key is an integer from the input array and the value is how many times it has occurred in the array. It follows the same concept of your "2d array" in that it is an associative array. The hash table just happens to be an implementation with excellent performance. So if I had to pick one solution for this problem, it would definitely be this one. It's super fast no matter how many integers you throw at it (unless you systematically pick values that collide in the hash table) and it's even the simplest looking algorithm.

Now if you want to return an array with all mode values (in case there are more than one), just a bit of tweaking is needed regardless of which algorithm was used


This would definitely be my solution, as well. Scalable and fast and easy to write and since the key is a primitive type, your lookup speed will almost always be O(1), i.e. constant, even in worst-case scenarions.
_________________


Back to top
View user's profile Wiki User Page Send private message XBL Gamertag: UpgradePolecat
Display posts from previous:   
Post new topic   Reply to topic    ScoreHero Forum Index -> General Chat All times are GMT
Goto page Previous  1, 2, 3 ... 16, 17, 18 ... 28, 29, 30  Next
Page 17 of 30

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum




Copyright © 2006-2024 ScoreHero, LLC
Terms of Use | Privacy Policy


Powered by phpBB